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Abstract 

Oh ■ We develop a Bayesian statistical model and estimation methodology based on Forward Pro- 

\ jection Adaptive Markov chain Monte Carlo in order to perform the calibration of a high- 

dimensional non-linear system of Ordinary Differential Equations representing an epidemic 
model for Human Papillomavirus types 6 and 11 (HPV-6, HPV-11). The model is compart- 
mental and involves stratification by age, gender and sexual activity-group. Developing this 
model and a means to calibrate it efficiently is relevant since HPV is a very multi-typed and 

■ common sexually transmitted infection with more than 100 types currently known. The two 
. types studied in this paper, types 6 and 1 1, are causing about 90% of anogenital warts. 

We extend the development of a sexual mixing matrix for the population, based on a formu- 
fTj ■ lation first suggested by Garnett and Anderson. In particular we consider a stochastic mixing 

qq . matrix framework which allows us to jointly estimate unknown attributes and parameters of 

the mixing matrix along with the parameters involved in the calibration of the HPV epidemic 
model. This matrix describes the sexual interactions between members of the population un- 
der study and relies on several quantities which are a priori unknown. The Bayesian model 
developed allows one to estimate jointly the HPV-6 and HPV-11 epidemic model parameters 
^ ' such as the probability of transmission, HPV incubation period, duration of infection, duration 

■ of genital warts treatment, duration of immunity, the probability of seroconversion, per gen- 
der, age-group and sexual activity-group, as well as unknown sexual mixing matrix parameters 
related to assortativity. 

Finally, we explore the ability of an extension to the class of adaptive Markov chain Monte 
Carlo algorithms to incorporate a forward projection simulation strategy for the ordinary dif- 
ferential equation state trajectories. Efficient exploration of the Bayesian posterior distribu- 
tion developed for the ODE parameters provides a challenge for any Markov chain sampling 
methodology, hence the interest in adaptive Markov chain methods. We conclude with simula- 
tion studies on synthetic and actual data from studies undertaken recently in Australia. 
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1 Background on Modelling HPV and Relevance to Commu- 
nity Health. 

The human papillomaviruses (HPV) are a family of small DNA viruses that preferentially infect 
differentiating epithelial cells of the skin and mucosae. More than 100 HPV genotypes have thus far 
been identified, classified according to their tissue tropism (mucosal or cutaneous) and oncogenic 
potential (high or low). About 40 HPV types are known to infect the mucosae, including those 
of the anogenital and oral tracts, and 13-18 of these are considered to be oncogenic (high-risk) on 
the basis of their association with malignancies. Low-risk HPV types are associated with benign 
lesions such as genital warts and low-grade intraepithelial neoplasias of the cervix [41, 55]. Sexual 
contact is the primary mode of transmission [12] and HPV is the most common sexually transmitted 
infection in the world. HPV is known to be the causal factor in the vast majority of cervical cancer 
cases and is also implicated in a proportion of other anogenital cancers and cancers of the head and 
neck. The overall burden of disease attributed to HPV, both cancers (as much as 5.2% of incident 
cancers worldwide) and benign lesions such as genital warts, is considerable [39]. 

Two vaccines have been developed and shown through clinical trials to be highly effective in 
the prevention of precancerous lesions and persistent infection due to an important subset of HPV 
types [44, 54]. The quadrivalent vaccine (Gardasil) protects against high-risk HPV types 16 and 18 
that are associated with 70-75% of cervical cancers, and against low-risk HPV types 6 and 1 1 that 
cause more than 90% of genital warts. The bivalent vaccine (Cervarix) provides protection against 
HPV types 16 and 18 only. Both vaccines have been licensed in more than 100 countries and 
publicly funded national immunisation programmes have commenced in some of these including 
Australia (Gardasil) [19]. 

National immunisation programmes are costly and decisions regarding their implementation are 
generally made on the basis of health-economic evaluations. In regard to HPV, these decisions are 
complicated by two related factors: 1) HPV is a sexually transmitted infection; and 2) only a small 
proportion of infections do not resolve and can lead to cancer many years or decades subsequent to 
acquisition. Both of these factors have generally been addressed by employing models to estimate 
the long-term impact of vaccination on the incidence of HPV-related disease so that the costs and 
benefits can be calculated. However, it is the former of these factors that is of particular relevance 
in the context of this study. Because HPV is an infectious disease, the rate of transmission in a 
population, commonly referred to as the "force of infection", is a function of the prevalence of 
the infection in the population at any given time [1]. Furthermore, the benefit of vaccination that 
confers immunity to infection (immunisation) is not confined only to those directly immunised — 
"unvaccinated individuals" (who remain susceptible to infection) enjoy a degree of indirect protec- 
tion because their risk of exposure is reduced through a diminishment in circulating virus. This 
indirect benefit of vaccination is referred to as "herd immunity" [20]. In order to model the impact 
of vaccination on the course of the HPV epidemic over time, in a manner that captures the herd 
immunity effect, we must use dynamic transmission models [9, 16]. Failure to do so can result in 
an underestimation of the potential benefit of vaccination. 

Dynamic mathematical transmission models have been used extensively to estimate the poten- 
tial impact of HPV vaccination in a wide variety of settings (e.g., [7, 13, 18, 32, 45], not compre- 
hensive) and as a component of cost-effectiveness evaluations that have informed decisions on the 
funding of vaccination programmes (e.g., [8, 34, 36, 38, 40, 52], not comprehensive). Transmission 
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models have traditionally been formulated in a deterministic framework as systems of differential 
equations (ordinary or partial) [1,35,59]. With the increasing power of personal desktop computers 
and access to high performance computing facilities, the use of agent-based stochastic modelling 
approaches has become more prominent (examples for HPV include [11, 57]). The latter approach 
is particularly useful for low prevalence infections where there is a possibility of extinction and/or 
where it is necessary to capture events that occur at the level of the individual (e.g., tracing and 
treating sexual partners of infected individuals). However, for their computational efficiency, ana- 
lytical tractability and ability to provide mechanistic insights to epidemic dynamics, deterministic 
ordinary differential equation (ODE) models are often preferred, particularly for endemic infec- 
tions such as HPV [47]. We have previously developed a deterministic single-type transmission 
model for HPV- 16 [45] and more recently multi-type models for HPV types 6, 11, 16 and 18 in 
order to evaluate the potential impact of vaccination. In this paper we discuss a novel formulation 
of a model for HPV-6 and -11 that incorporates Australian data on genital warts incidence and 
type-specific seroprevalence. 

In this study, we develop a Bayesian statistical model and estimation methodology to perform 
the calibration of a high-dimensional non-linear system of ordinary differential equations repre- 
senting an epidemic model for HPV types 6 and 1 1 . While the health and economic consequences 
of HPV- 16 and -18 are more serious than for types 6 and 1 1 (hence their inclusion in both currently 
available vaccines), a model for types 6 and 11 was chosen for this study because of the availabil- 
ity of Australian genital warts incidence [43] and seroprevalence [42] data that could be used for 
calibration. 

Despite extensive study, there remains considerable uncertainty regarding aspects of the natural 
history of HPV and the patterns of sexual behaviour that underpin transmission [46, 58]. Further- 
more, many studies have not been designed with transmission models in mind, so the processes 
and phenomena they measure cannot always be applied directly and/or their interpretation in the 
context of transmission is not clear. Areas of uncertainty that present particular challenges for mod- 
elling include interpretation of vaccine efficacy from clinical trials in the context of transmission, 
the duration of infectiousness (as opposed to the duration of detectability using currently available 
tests), the duration and nature of naturally acquired immunity, the relationship between seropositiv- 
ity and immunity, and the probability of transmission on sexual contact. Some of these are difficult 
to measure at a population level for practical and/or ethical reasons. We demonstrate a Bayesian 
statistical methodology that will address these uncertainties in the estimation and calibration of the 
ODE epidemic model we have developed. This methodology allows us to statistically quantify the 
extent of the uncertainty in model outcomes that are derived from uncertainty in the inputs, and the 
contribution of uncertainty in individual parameters to the uncertainty in the outcomes [51, 31]. It 
can also be used to predict the impact of vaccination on a population. 

1.1 Introduction to Adaptive Markov chain Monte Carlo 

Markov chain Monte Carlo (MCMC) sampling has gained a wide recognition in all areas of mod- 
elling and statistical estimation as an essential tool for performing inference in Bayesian models 
(see reviews and discussions in [28] and [10]). In this paper we consider the recently developed 
class of algorithms known as Adaptive MCMC (see a review in [3]), and demonstrate how they 
may be extended to solve statistically challenging estimation and prediction problems of direct rel- 
evance to the interpretation and analysis of the calibration and vaccine response dynamics for HPV 
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epidemic models. We illustrate this on the model we develop for HPV-6 and -11. 

Standard MCMC algorithms that do not incorporate adaptation often require a degree of "tun- 
ing" of the parameters controlling the algorithms performance. This is typically performed by 
off-line simulations to assess performance of the mixing of the resulting Markov chain followed by 
numerical investigation of the convergence rates to stationarity of the chain for different algorithmic 
settings of the proposal distribution. For example, the widely used variant of the Metropolis Hast- 
ings algorithm, the Random Walk Metropolis algorithm has mixing performance that is controlled 
through specification of the Markov chain proposal distributions covariance matrix. Tuning this 
matrix for optimal performance can be computationally expensive and inefficient. Optimal perfor- 
mance of an MCMC algorithm is typically either specified by the convergence rate of the Markov 
chain to stationarity or through the related quantity, the acceptance probability of the rejection step 
in the MCMC algorithm. In this regard, theoretically optimal results have been derived for several 
classes of statistical models, which now act as guides for more complicated sampling problems 
(see discussions in [50]). 

In this paper, the ODE HPV epidemic model is constructed on a high dimensional space both 
in the parameters of the model and also in the latent ODE state trajectories solved for at each dis- 
crete time point in the "forward projection" ODE solver. This high dimensionality in the posterior 
parameter space provides a significant challenge for standard MCMC algorithms with respect to 
the design of an efficient proposal mechanism for the Markov chain. In particular, in the model 
considered in this paper the fact that we incorporate a forward projection stage for the ODE solver 
adds additional complications in the design of the proposal. Therefore, it is desirable to automate 
this proposal construction for the MCMC sampler, avoiding computationally expensive tuning pro- 
cesses. Hence, we develop an adaptive version of the Random Walk Metropolis algorithm, coupled 
with forward projection. The incorporation of an adaptive proposal mechanism in an Markov chain 
Monte Carlo algorithm has been demonstrated to improve the performance of the sampling algo- 
rithm relative to standard MCMC approaches, see reviews of several examples of this improvement 
in ??. This improvement is achieved by learning on-line the structure of the Markov chain proposal 
distribution in an automated fashion, avoiding tuning of the MCMC proposal mechanism. 

There are several classes of adaptive MCMC algorithms and each class has several adaptation 
strategies [6, 2]. These approaches can be classified as either internal adaptation mechanisms, in- 
cluding controlled MCMC methods or external adaptation strategies (see discussion in [6]). The 
distinguishing feature of adaptive MCMC algorithms, when compared to standard MCMC, is that 
the Markov chain is generated via a sequence of transition kernels. Adaptive algorithms get their 
name from the fact that they utilise a combination of time or state inhomogeneous proposal ker- 
nels. Each proposal in the sequence is allowed to depend on the past history of the Markov chain 
generated, resulting in many possible variants. When using inhomogeneous Markov kernels it is 
particularly important to ensure the generated Markov chain is ergodic, with the appropriate sta- 
tionary distribution. Several recent papers proposing theoretical conditions that must be satisfied to 
ensure ergodicity of adaptive algorithms include [6] and [30]. The paper [49] proves ergodicity of 
adaptive MCMC under conditions known as Diminishing Adaptation and Bounded Convergence. 
Designing an adaptation strategy that satisfies these conditions guarantees asymptotic convergence 
of the law of the Markov chain samples to the target posterior and ensures the Weak Law of Large 
Numbers holds for bounded test functions of the parameter space (an interested reader is referred 
to [49] for details). In this paper we work with a transition kernel that is well known to satisfy these 
conditions and has been used successfully in several applications, based on the adaptive Metropolis 
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algorithm. 



1.2 Contributions 

In this paper we formulated a model for the HPV types 6 and 1 1 which adequately covers all aspects 
of the disease transmission and treats the incorporated seropositivity as a state associated with an 
individual's recovery. Our aim was to ensure the model structure is minimalistic yet sufficient 
for testing whether this particular interpretation of seropositivity agrees with the available data the 
model is calibrated to. 

Using this deterministic ODE model we undertake three key tasks: construction of a robust 
statistical modelling framework under a Bayesian paradigm to perform calibration of this coupled 
ODE transmission model with extensions to a stochastic population mixing matrix formulation; de- 
velopment of an automated statistical estimation methodology, based on a modification to adaptive 
MCMC to incorporate forward projection for the ODE, that will provide a robust means of perform- 
ing calibration and statistical analysis of the calibration performance; and a statistical methodology 
to study vaccination responses based on the posterior predictive distribution we estimated via adap- 
tive MCMC. 

We perform detailed studies on synthetic data to assess the properties of the model and adaptive 
MCMC forward projection methodology. This is followed by assessment of the calibration perfor- 
mance on real data collected from Australian sources (genital warts incidence [43], and HPV-6 and 
-11 type- specific seroprevalence [42]). 

2 Human Papillomavirus (HPV) Transmission Model 




Figure 1: A compartmental HPV-6/- 1 1 transmission model (see Table 1). 



5 



We consider a dynamic transmission model for HPV types 6 and 11 (Figure 1). This model 
is compartmental, such that the entire population is viewed as being distributed between a set 
of non-overlapping groups ("compartments") representing the stages of disease progression. The 
compartments are specified as presented in Table 1. The model is intended to describe how the 
number of people in each compartment changes over time. For example, members of the sus- 
ceptible population 'move' from S to / as they become infected, and members of the recovered 
seropositive population 'move' from P to S as they lose their immunity. 

Underlying Epidemic Model Assumptions 

Here we briefly discuss the key assumptions the model is based upon. 

Population (a) the modelled population consists of people aged 15-59 y.o. who are divided into 9 
separate five-year age-groups (see Table 8). Limiting the population to this particular age 
range is motivated by a presumed absence of sexual activity in people younger than 15 
y.o. and people over 59, though any extensions to these ranges is not precluded by our 
methodology; five-year age-groups are commonly used for reporting results of surveys and 
trials including the one providing the sexual behaviour data for our model (see, for example, 
[43, 42, 29]); (b) the modelled population is constant over time; consequently, immigrants 
and temporary visitors are not accounted for, which may be an important simplification for 
Australia whose population has been steadily growing due to immigration; furthermore, Aus- 
tralia is a popular destination for young travellers who often maintain a high level of sexual 
activity; (c) mortality, though formally implemented, is not caused by HPV infection but 
rather serves as a convenient way of removing individuals who do not contribute to the HPV 
transmission due to advanced age signified by a cessation of sexual activity; (d) the num- 
ber of males in the whole population and every sexual activity or age-group is equal to the 
number of females; (e) no transition between genders is allowed (i.e. males can not become 
females and vice versa); 

Sexual behaviour (a) the modelled population is heterosexual with all people belonging to one of 
four sexual activity (risk) groups (group one being the least active and group four the most 
active); the proportions of the population in each risk-group 1-4 are 0.6, 0.27, 0.1 1 and 0.02, 
respectively [53, 45] (also see Table 9); (b) people are born into a particular risk-group and 
can never leave this group but their activity level is a function of their age; this restriction 
implies, for example, that even when someone from the most active group gets older, his or 
her activity level declines but does not drop to the level of representatives of a less active 
group of the same age; (c) the annual sexual partner change rate for an individual is fully 
defined by his or her age and risk-group; the manner in which an individual of a given age, 
gender and risk-group "chooses" partners of the opposite gender and of a given age and 
risk-group is described by means of a sexual mixing matrix (discussed in section 2.1); 

HPV transmission and seropositivity (a) we assume people seek treatment immediately upon 
becoming aware that they have genital warts; (b) we associate seropositivity and seroneg- 
ativity exclusively with the recovered state such that only those in the recovered (immune) 
state can have the status of seropositive/seronegative.This status is lost upon removal from 
the recovered state to the susceptible state (i.e., loss of immunity). In the context of our 
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model the recovered/seropositive state corresponds to those people who have developed de- 
tectable antibodies (using currently available tests) to to HPV-6/-1 1 through an immunogenic 
response. Conversely, the recovered/seronegative state corresponds to those who have recov- 
ered and are immune to reinfection but have not developed detectable antibodies. In general, 
seropositivity can serve as a long-lasting marker of ongoing or prior infection, though not 
a particularly reliable one in the case of HPV as only a proportion of those exposed to in- 
fection develop detectable antibodies [42]. Furthermore, there is some evidence supporting 
an alternative notion of seropositiviy to the one we have assumed, whereby seropositivity in 
only a marker of previous infection and is not correlated with protection against reinfection 
(see [56]). Such a perspective would lead to a more complicated model structure and we 
therefore do not focus on it in this paper. However, the methodology we develop can easily 
be extended to this context. 

Formulation of the model as a system of ODEs 

Our model is formulated as the following system of ordinary differential equations (ODEs): 

^g,s,a ^g,s,a\t}^g,s,a Cgi.Pg,s,a ^g,s,a) g ^>g,s,(a— 1) ~^^g,s,a 

4q ^2( S a,s,9 + I g ,s,9 + Gg yS)9 + P 9iS)9 + N g>S!% )8 1 (a) X (1) 
g,s 

(0.6<Si(s) + 0.2 75 2 (s) + 0.11<J 3 (s) + 0.02<J 4 (s)), (2) 

Ig,s,a •^g,s,a(^)'5g,s,a \1g Pg)^ S)S,a ~^-g,s,{a— 1) ~^Ig,s,ay (3) 

Gg^s,a ^gGg^s,a ~\~ g"ff,s,(o— 1) g S a) (4) 

Pg,s,a ^g(^Pg^g,s,a ^gGg : s,a) CgPg,s,a ~^Pg,s,(a—l) ~^Pg,s,ay (5) 

Ng,, a = + (6) 

Here the capital letters denote the number of people in a compartment and the subscripts denote 
gender (g; for males g — 1, for females g = 2), one of the four sexual activity-groups mentioned 
previously (s G {1, . . . , 4}), and an age-group (a G {1, . . . , 9}); the dot denotes a derivative with 
respect to time; 5\ (a) is the Kronecker delta function, equal to 1 if a = 1 or otherwise, and the 
system coefficients should be interpreted as described in Table 2. It should be emphasised that all 
coefficients in the model formulation are gender specific, i.e., they can take different values for 
males and females. System (2)-(6) contains a number of terms describing the process of aging. 
Since each age-group comprises a five-year band, members of the population age (i.e. move to 
the next age-group) at a yearly rate of 1/5. To maintain a constant population size we assume that 
there is an inflow of people into the susceptible compartment of the youngest age-group (group 1) 
as defined by: 

^ $^(^,9 + I g ,sfi + G g ^ 9 + P g>s>9 + N g , 8fi )5 1 (a)(0.65 aA + 0.275 2 (s) + O.U5 3 (s) + 0.025 4 (s)). 
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This term is obtained by dividing the total number of individuals leaving the oldest age-group 
(group 9) each year on reaching age 60, (S g>S)9 + I g>S)9 + G 9)Si9 + P 9iSt9 + N g>S)9 )/5, evenly between 
2 genders and 4 sexual activity-groups. To every g and s is added S 9)S ^ according to the previously 
defined distribution of the population across risk- groups (see Table 9). The implementation of 
aging is a mechanism for people to enter and leave the sexually active population continuously and 
is necessary to propagate the effect of vaccination: we must ensure that vaccination of individuals 
in a particular age-group will later contribute to the number of vaccinated in older age-groups. 

Each of the equations (2)-(6) describes the change in the number of individuals that occurs 
during a small time period as the sum of the number of individuals entering this compartment 
from other compartments and those leaving the compartment. Discussions on construction of com- 
partmental disease transmission models are presented in [35] and [59]. Consider, for example, 
compartment G (Figure 1): we can calculate the change in the number of individuals in this com- 
partment during a small interval of time by adding up the individuals entering the compartment 
during this time interval (the infected who developed genital warts, ^ g I g , s ,a and the aging mem- 
bers of G from age-group a — 1, G 9:S , a /5 ) and subtracting the number that leave the compartment 
(recovered who go either to P or N, r g G g ^ ta and the aging members of G moving to age-group 
a + 1, G fl)S)0 /5). In so doing, we will obtain 7 g I g>s , a + Gg^^/h - r g G g ^ a - G 9iS>a /5 which is 
the right-hand side of equation (4). 

It is necessary to point out the crucial role of the force of infection in our model (Table 2). 
This is the only non-constant coefficient we have to deal with, which introduces non-linearity into 
the system (2)-(6). It depends on the age and risk-group specific patterns of sexual behaviour 
represented by a matrix usually known as a 'sexual mixing' matrix [1]. 

In the following subsection we provide a concise description of the construction of the sexual 
mixing matrix. For complete details we refer the reader to Appendix 1 and an associated research 
paper [24]. 

2.1 Sexual Mixing Matrix 

In this section, we discuss the model choice, the construction and extensions we made in this paper 
to the development of a statistical model for the sexual interaction of members of the population, as 
defined by the sexual mixing matrix. We consider a Sexually Transmitted Infection (STI) transmis- 
sion model which has many features in common with other STI models. For example the models 
for gonorrhea in [26] or those developed for HIV in [24] and [23], which describe patterns of mix- 
ing between age and sexual activity-groups with respect to HIV in heterosexual communities. Like 
these other models, our approach relies on certain assumptions about the way individuals form 
their sexual partnerships. This partnership formation process is commonly referred to as "sexual 
mixing". A simplified model for sexual mixing can be described via a sexual mixing matrix as 
described in [1], and for which examples can be found in [24], [22], [25] and [27]. 

It is not uncommon in the medical literature to assume the parameters of this mixing matrix 
to be known throughout the calibration of the resulting ODE epidemic model (see [22]). This 
is because these parameters are normally derived from the data obtained from an extensive sexual 
behaviour survey which often serves as the only comparatively reliable source of information on the 
subject. Therefore, if the number of participants in a particular survey is significantly larger than in 
other surveys, this survey will likely be considered as the most trustworthy source of information on 
sexual mixing, often irrespective of the experimental design and population studied. However, due 
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to the personal nature of such surveys it is understood that their results are to be taken cautiously. 
In our model we will assume that two of the parameters specifying the sexual mixing matrix are 
unknown and should be jointly estimated with the ODE model parameters. 

In our heterosexual model formulation, the mixing matrix is a (2 x 4 x 4 x 9 x 9) dimensional 
matrix comprised of the product terms c fl;SiS / )0;a /p SjSjS / iaia /, where c fl;S y j0)(I > is the mean per capita an- 
nual rate at which an individual of gender g from a risk or activity-group s and age-group a acquires 
new sexual partners of the opposite gender g' from a risk-group s' and age-group a'; p g:S ,s',a,a> is the 
conditional probability that an individual of gender g from sexual activity-group s and age-group a 
acquires a sexual partner of the opposite gender g' from sexual activity-group s' and age-group a'. It 
is clear that estimation of all of these parameters is an almost insurmountable statistical challenge, 
which is one of the reasons why these parameters are often taken as fixed in any given calibration 
study of STI transmission models. There are two broad approaches one could pursue, the first, 
given we are working in a Bayesian modelling framework in this paper would involve prior elicita- 
tion for these population parameters based on expert opinions of annual interaction rates that would 
be reasonably understood by medical practitioners in sexual health clinics, sexual health workers 
and social workers in regions in which respondents were recorded. The other alternative involves 
re-parameterizing aspects of this matrix, simplifying it significantly, allowing one to account for 
the uncertainty associated with specification of this matrix in an appropriate simplified stochastic 
model. This would involve finding suitable factors common to aspects of this matrix that could 
instead be taken as stochastic and estimated in the model calibration, which in turn allow one to 
derive each element of the sexual mixing matrix. Most importantly, the framework we develop and 
present for the estimation and calibration of the transmission model is general enough to be used 
for either of these approaches and any degree of unknown parameters in the sexual mixing matrix 
and any parameterisation deemed suitable for a given population study. 

In this paper, we then utilise this matrix to specify the force of infection (see equations (2) and 
(3)) for any individual from any subgroup (g, s, a) according to the parameterisation 

a , ^gW 1 ( 7 ) 

Sg',s',a' + Ig',s',a' + Gy,s',a' + Pg',s',a' + ^g',s',a' J 

where g is the transmission probability per partnership, i.e. the probability that a susceptible 
person of gender g will become infected due to a partnership with an infected person of gender g' . 
As discussed in [24], we can easily incorporate into specification of (7) several parameters allowing 
us to control to what extent the matrix should reflect assortative partnership formation patterns, i.e. 
how likely are individuals to find sexual partners among 'similar' individuals, thereby altering the 
properties of the matrix, making it more or less dense. For example, we can vary the extent to 
which older males prefer to have younger female sexual partners, or, possibly, a tendency of older 
females to choose younger male sexual partners. It is also necessary to be able to adjust for or 
account for the readiness of each gender to compromise with the wishes of the opposite gender. 
This last point becomes significant whenever supply of individuals of one gender does not meet the 
demand for sexual partners from the individuals of the opposite gender. 

In this paper, we will treat the degrees of assortativity by age and sexual activity-groups, ob- 
served to have a noticeable effect on the model calibration, as uncertain, while the rest of the sexual 
mixing matrix parameters will be fixed. We do not assume that all of the parameters specifying this 
matrix are unknown since we want to keep the total number of parameters in our model as low as 
possible. 
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While a matrix with these features cannot encompass all the complexities of human sexual 
mixing, it certainly enables us to explore various relatively plausible mixing scenarios. In addition 
it is a manageable model formulation and this framework has been widely used by STI modellers 
(for examples of its use in HPV models see [7, 37, 45, 17]). 

3 Bayesian framework for non-linear ODE HPV Epidemic model 

In constructing a statistical model for the HPV epidemic model we treat the parameters of the 
non-linear system of ODE equations describing the epidemic as unknown random variables. In 
addition, we treat some of the parameters of the sexual mixing matrix associated with assortativity 
as unknown random variables to be estimated jointly with the ODE epidemic model parameters. 
We formulate a Bayesian model for this system with the prior specifications given in Table 3. 
Furthermore, we treat the ODE system as purely deterministic, conditional on a set of system 
parameters. In other words, the trajectories of the latent states in the ODE system over time are 
conditionally, deterministically specified by the system of ODEs. Therefore, we do not derive a 
system of stochastic differential equations as has been done in the Pharmaco-Kinetic/Pharmaco- 
dynamics literature (see [14] or [15]). Instead our focus lies purely in the "calibration" of this 
ODE epidemic model to an observed set of data based on observations which are formed by a 
transformation of the state of the ODE system and observed in noise. We detail the process of 
forward simulation we utilise to obtain the state of the non-linear ODE system at any given time 
point, conditional on the model parameters defined in Section 3.1. 

3.1 Forward simulation of the non-linear ODE model 

The states of the system at time points t E {1, . . . , T} are denoted by vectors X g >a s (l : T) = 
\X gA )S (1), . . . , X g A ,s{T)] , which generically represent the dynamic evolution of the ODE sys- 
tem, where for a given gender, age and sexual activity-group at time t the state encompasses the 
following components X g ^ s {t) = [S g>a , iS (t), I gja>s (t),G g>aiS (t), P g , a , s (t), iV w (i)]. Therefore, con- 
ditional on a set of parameters in the ODE system, we can iterate the ODE system forward in time 
using an ODE solver to obtain the states for each population group at time t. Note, the times t, for 
which the system is solved, will generally be of a much finer granularity to those at which observa- 
tions are collected in a population study. This set of times should include, for the minimum subset 
of times for which the system is solved, the observation times. 

In our estimation procedure we considered several different solvers, and noticed that the choice 
of solver can have a significant effect on the accuracy and on the efficiency of the computations 
undertaken in the statistical estimation. In particular, depending on the parametrization and devel- 
opment of the relationships between states or compartments of the model and the parametrization of 
the mixing matrix, one can obtain non-stiff or stiff systems of equations. In such settings repeated 
application of generic solvers will be computationally inefficient. 

The software implementation of the framework described in the paper was written in Fortran 
90/95/2003 and utilised a universal solver dodesol which is a part of the Intel® Ordinary Differ- 
ential Equation Solver Library. This solver is, in fact, a collection of five different solvers designed 
to solve numerically initial value problems of the form 

x = g(x,t), x(t ) = x , t> t , (8) 
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where t is an independent variable, a? is a vector of state variables to be solved for and g(x, t) is a 
function of t and x. 

An important distinction between these solvers is that each of them being of explicit, implicit 
or hybrid type is particularly efficient for a given degree of stiffness of the system. The universal 
solver first estimates how stiff the system is and then employs an appropriate solver to handle it. It 
is pertinent to note that in the case where (8) can be solved explicitly a modification of the fourth 
order Merson's method combined with a first order multi-stage method will be applied. Should (8) 
be treated implicitly, an L-stable fourth order (5, 2)-method with an option to fix the Jacobian will 
be used. 

We chose the dodesol solver not only because of its convenient Fortran interface, for which 
the remainder of the model and code was developed, but also efficient performance which we com- 
pared with that of the two standard MATLAB solvers ode 4 5 and ode 15 s. The solver dodesol 
was especially superior to those provided by MATLAB solvers ode 4 5 and odel5s when the 
stiffness of our ODE system increased. However, it should be pointed out that dodesol speed 
can be considerably affected by the selection of its minimal time step and relative error threshold. 
To obtain the simulation results discussed in this paper we used the minimal step size 1 x 1CT 10 , 
the initial step size 1 x 10~ 6 and the relative error tolerance 1 x 1CT 3 . Also, we did not make use 
of the option to pre-specify the Jacobian provided by the solver. 

3.2 Prior choices for parameters of non-linear ODEs and the sexual mixing 
matrix 

In this section we specify prior model developed for the non-linear ODE system modelling the 
epidemic of HPV as well as the extension to the mixing matrix, making it stochastic. We summarise 
the prior choices made for each of the model parameters below in Table 3, where we have worked 
with a posterior parameterisation based on the input components of the model parameters in Table 
2. Combining these prior choices, we can present the following basic prior structure: 

p(TRm, TRf, WlPrn, WIPf, DWTm, DWTf, DAIm, DAIf, PSCm, PSCf, 

Dim, DIf, E, A Y , EPSa, EPSr) = p(TRm)p(TRf)p(WIPm)p(WIPf)px 
(DWTm)p(DWTf)p(DAIm)p(DAIf)p(PSCm)p(PSCf)x 

p(DIm)p(DIf)p(E) P (A Y )p(EPSa)p(EPSr). (9) 

We also note, that when specifying the prior distributions and their hyper-parameters (i.e. pa- 
rameters describing the shape of distributions assigned to the prior distribution parameters) we 
utilise data reported in the literature from previous studies of HPV transmission to inform some ap- 
propriate prior specifications (see Table 4). Importantly, these data are completely different to the 
real data sets that were studied in the present paper, based on different populations, different time 
periods and different locations, hence we have not used our observation data twice. Furthermore, 
where data were not informative on particular parameters, uninformative prior specifications were 
utilised. As is usually the case with interpreting any reported HPV data, it is appropriate to consider 
the reported values of observations and calibrations as nothing more than an estimation or a trial 
with its own particular limitations and proneness to error. This means we can safely consider values 
outside of the reported confidence intervals to be plausible in our study. Using (4) we specify the 
priors for each of the HPV models (Table 5). 
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The prior parameters for seroprevalence observation errors denoted by A Y had hyper-prior pa- 
rameters given by (^ m , ^Am) — (2, 2) and the diagonal element a of the incidence observation 
error covariance matrix was distributed according to (k a , 6 a ) = (2, 5). Finally, the priors for the 
sexual mixing matrix parameters associated with assortativity had hyper-priors specified according 

to (a £a , PJ = (0.5, 0.7) and (a er , 0J = (0.5, 0.7). 

3.3 Likelihood model for parameters of non-linear ODEs and Garnett mix- 
ing matrix 

Having specified the prior structure of the model we present the observation model, likelihood and 
details of the actual data studied. In particular, we note that conditional on a particular realisa- 
tion of the model parameters in Table 3, the latent unobserved state trajectories of the system are 
deterministic, non-analytic solutions to the system of ODEs in Equations (6). These solution tra- 
jectories are transformed according to age, gender and risk-group to produce a set of results that 
can be directly observed in noise in the population under study. In the following section we will 
describe and detail these transformations and the associated statistical assumptions made on the 
observation error. In doing so we will also specify models suitable for situations of disequilibrium 
of the disease states within the population such as occurs after a serious outbreak, a vaccination or 
community education awareness program, etc., and models for situations in which the disease in 
the population under study can be considered to be at equilibrium. 

In this section we first describe the observations that are utilised in the calibration of the ODE 
epidemic model and the sexual mixing matrix. Importantly, the real observations utilised in this 
paper, based on HPV-6/-1 1 data collected in Australia, will be treated as taken from a population at 
endemic equilibrium. 

Then we present the resulting likelihood model utilised, followed by derivation of the poste- 
rior distribution. We first note that the general likelihood model will be presented in which ob- 
servations of the transformed state trajectories are known at a fixed set of discrete time points 
t G {1, 2, . . . , T}. In practice, these discrete time points may be unevenly spaced and this would 
not affect the estimation procedure to be developed. 

The actual observations we consider in this model are seroprevalence and genital warts inci- 
dence by age-group, which make up the "observations" considered in our model. They are specified 
as below: 

Seroprevalence is the proportion (or percentage, in our model) of individuals in a given group 
who are recovered and seropositive; 

Genital warts incidence is a rate defined as the number of new cases of genital warts in the pop- 
ulation per person (or per 1000 persons, in our model) per year. 

In order to evaluate the output of our model, we intend to compare it with the real-life seropreva- 
lence [42] and genital warts incidence [43] data. The seroprevalence data were collected in the 
second half of 2005 from various laboratories in New South Wales, Victoria and Queensland where 
about 80% of the Australian population resides. A validated sampling method for serosurveillance 
was applied to test 1523 serum samples from females and 1247 samples from males aged from 
to 69 years. For each individual the age-group, sex, and date of sample collection were recorded. 
The overall population HPV seroprevalence was derived by weighting the results obtained from the 
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samples to 2005 Australian midyear population estimates by age. Incidence data were estimated 
from the Bettering the Evaluation of Care and Health (BEACH) cross sectional database. In par- 
ticular, the annualised new case consultation rate stratified by gender and age from April 2000 to 
September 2006 was analysed and normalised to the corresponding 2004 Australian population. 
In total, 639 consultations related to genital warts were registered and of these roughly 35% were 
new cases. BEACH did not capture the people who attended sexual health clinics to seek treatment 
for genital warts. However, it was estimated that for every person visiting a GP there was 0.298 
persons visiting a sexual health clinic, so the genital warts cases managed in such clinics were ac- 
counted for by multiplying the incidence rates obtained from the GP data by 1.298. The data are 
summarised in Tables 6 and 7. 

Generally, observations can be considered in this model as arriving at irregularly spaced inter- 
vals and generally not at all times t G {1, . . . , T}. For simplicity, we will assume that a full panel of 
observations for all age-groups, activity-groups and genders is available at each observation time, 
though this assumption can also be relaxed under the model developed in this paper. We will also 
assume for simplicity that since we can solve for the state of the non-linear system at any specified 
time point, conditional on a set of model parameters. We will always have the observation time 
corresponding to one of the time points t E {1, . . . , T}. 

We model the observation vector as the result of a vector function of the non-linear ODE sys- 
tem state Xg^ S j at each time t, denoted by O fl)0)S (t) = [D g! a,s(t),Yg,a,s(t)] which contains the 
observation D g ^ s (t) representing incidence at time t for a given gender, age-group and sexual 
activity-group as well as Y g a s (t), which represents the seroprevalence in the given category. 

The observed numbers of new diagnoses are assumed to be observed in Gaussian noise. This 
assumption is reasonable since the counts obtained are very large, so it is suitable to make a contin- 
uous distributional assumption. The likelihood model for the observed seroprevalences is assumed 
to be a beta distribution since it represents observations of proportions given the parameters. Fur- 
thermore, conditional on the parameters of the model, the joint likelihood model is assumed to have 
the following conditional independence properties between observations conditional on informa- 
tion on the individuals categories of gender, age and activity-group as follows: 

£ (TR, WIP, GWT, DAI, PSC, DI, E, Ay, EPSa, EPSr, X w (l), . . . , X W (T); 

T 

a„„(i) a„„(v)) II II II n p (°^ , s (*)|tr,wip,gwt, 

ge{l,2} ae{l,...,9} s€{l,...,4} t=\ 

T 

DAI, PSC, DI, E, Ay, EPSa, EPSr, X gA!S (t)) = J] J] J] JJP(D 9AS (t)|TR, 

ge{l,2} ae{l,...,9} se{l,...,4} t=\ 

WIP, GWT, DAI, PSC, DI, £, EPSa, EPSr, X w (t)) x 
P (Y g>at .(t) |TR, WIP, GWT, DAI, PSC, DI, A Y , EPSa, EPSr, X g ^ s (t)) 

T 

= n n n n^ivw^vw-^Mwi^^i^-^- ao> 

S6{1,2} ae{l,...,9} se{l,.„,4} t=l 

where we denote in bold the vectors of parameters corresponding to males and females. For ex- 
ample, TR = (TRm, TRf); the equilibrium mean structure for the number of new genital warts 
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diagnoses (incidence) of HPV-6 and HPV-1 1 in category g, a, s is defined as 



A**W(*) = f (TR, WIP, GWT, DAI, PSC, DI, E, EPSa, EPSr, X w (t)) 

x 1000 



1 I 9 ,s,a (ID 



1 Pg Sg^ s ^ a ~\- Ig.s : a G g : s,a Pg,s,a ^g,s,a 

which is a function of the state vector at time t and the model parameters. The multiplier 1000 
appears here because we compare the simulated incidence with the real incidence data reported per 
1000 individuals [43]. 

In this paper we always consider gender to be male or female coded with a 1 or 2 respectively, 
sexual activity to be split into four classes coded with 1,2,3,4 and ages as decomposed in Table 8 
coded by the indices 1 through 9. It has to be stressed that (11) is valid only under the assumption 
that our model is at endemic equilibrium. Otherwise, the incidence would be time dependent: for 
any two time points t and ti such that t\ = t + 1 (they are 1 year apart) would be given by the 
following: 

1 Ig s 5 a(^o) ^ 

= WIPg X S g , s>a (t ) + i^S) + G g ,s, a (t ) + Pg,sM + Ng tS , a (t ) X l000 ' a } 

Note, we have obtained the state vectors for the system X g>a >s (t) used in the evaluation of the 
likelihood model via forward simulation, conditional on the model parameters of the non-linear 
ODE model, ensuring that the solution points at which we evaluated the states of the system at 
least corresponded to the observation times t E {1, . . . , T} and generally would be a finer grid than 
these time points. This was achieved using a specialised ODE solver as described previously in 
Section 3.1. 

Additionally, we also define the observation equation for the likelihood corresponding to the 
seroprevalence observations for each age-group, activity-group and gender, as a function of the 
non-linear ODE system state and model parameters, which were specified according to a Beta 
distribution specified in Equation (13) below: 

Yg,a, s (t) = 9 (TR, WIP, GWT, DAI, PSC, DI, A Y , EPSa, EPSr, X g ^ s (t)) 

iW (13) 

Sg,s,a Ig,s,a Gg S a -\- Pg^s,a ^g,s,a 

The scale parameter A Y of this Beta prior is treated as a random variable and estimated in the 
posterior inference. The shape parameter, denoted by B Y is obtained by moment matching given 
the sampled scale parameter and the observations obtained from the current forward projection 
trajectory: 



B * ^ * [ wi) " V ' (14) 

We may then combine this likelihood and prior structure to obtain the posterior distribution of 
the model parameters, given the observations. The full posterior distribution of interest is given in 
Equation (15) and is comprised of the likelihood model in Equation (10) and the prior model in 
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Equation (9): 

p (TR, WIP, GWT, DAI, PSC, DI, S, Ay, EPSa, EPSr\O g , a , s {l), . . . , O w (T)) 

oc C (TR, WIP, GWT, DAI, PSC, DI, S, Ay, EPSa, EPSr, . . . , X ff)a>a (T); 

O g>aiS (l), 9tatS (T))p(TRm)p(TRf)p(WIPm)p(WIPf)p(DWTm)p(DWTf) P x 
(DAIm)p(DAI f)p(PSCm)p(PSC f)p(DIm)p(DI f)p(J:)p(AY)p(EPSa)p(EPSr). (15) 

The resulting posterior distribution therefore has 16 model parameters. At each time point t, the 
forward simulated ODE state vectors X(t) E M 360 , due to 2 sexes, 9 age-groups, 4 activity-groups 
and 5 compartments. Hence, the total dimension explored in the simulation performed in the results 
section is 360 x T, where we set T = 120. 

Next we demonstrate how to perform inference under this Bayesian model formulation. In 
particular we are interested in the Bayesian point estimators corresponding to the posterior mean 
(MMSE) and the posterior mode (MAP) estimates, as well as the distribution properties of the pos- 
terior. To explore these we must resort to a numerical procedure to draw samples from the posterior 
distribution. The class of methods we consider in this paper involves combining the forward sim- 
ulation procedure for solving the non-linear ODE system given a set of model parameters, with an 
adaptive Markov chain Monte Carlo (AdMCMC) algorithm to propose a new set of model parame- 
ters given past history of proposed parameter vectors. The details of the AdMCMC methodologies 
considered are presented in Section 4. 

4 Adaptive MCMC Strategies Combined with Forward Simu- 
lation 

In this section we present details for the adaptive Markov chain Monte Carlo (AdMCMC) algo- 
rithm that will be combined with the Forward simulation algorithm in order to sample from the 
posterior distribution given in Equation (15). In particular, we first introduce the background of an 
AdMCMC algorithm, before detailing the adaptation strategy we will explore in this paper based 
on the adaptation rules developed in [6, 49]. 

In essence, we first construct a MCMC proposal distribution to sample the static posterior pa- 
rameters, denoted by vector = [TR, WIP, GWT, DAI, PSC, DI, S, A Y , EPSa, EPSr}. Then, 
conditional on these model parameters proposed, we obtain the state trajectories for the ODE HPV 
epidemic model, given by X 9>a s (l : T) = [X 5j0)S (l), . . . , X g ^ s (T)\, which are generated us- 
ing forward simulation. The MCMC proposal distribution is parameterised by parameter vector 
<I>. The idea of adaptive MCMC methods is to learn appropriate values for <I> recursively utilising 
the previous samples of the Markov chain that have been accepted under the MCMC accept-reject 
mechanism. This is achieved on-line, adapting according to the support of the posterior distribution, 
thereby allowing the Markov chain to discover and explore the regions of the posterior distribution 
that have the most mass. Through this on-line adaptive learning mechanism the Markov chain 
proposal distribution can significantly improve the acceptance rate of the Markov chain, enabling 
efficient mixing and improving the samples obtained from the posterior. 

In this paper we consider a popular adaptation scheme proposed in the literature and analyse its 
ability to explore the very high dimension of the posterior distribution developed in the non-linear 
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HPV epidemic model. Before presenting the details of the adaptation scheme for the Markov pro- 
posal, we first present the generic algorithm developed for sampling from the posterior in Equation 
(15). In the following sequence of steps for the j-th iteration of the AdPMCMC Forward Simu- 
lation algorithm, we will update the state of the Markov chain from Q^" 1 ', with corresponding 
states Xg t a,}\l ■ T), to parameter vector @( J ) with associated state trajectories X g Xs(l ■ T). This 
algorithm is summarised below for one step of the AdMCMC Forward project algorithm: 

1. Sample 0* ~ q([0](j — 1), ■) from an Adaptive MCMC proposal constructed using previous 
Markov chain samples {©W, . . . , 

2. Solve the non-linear ODE system (2)-(6), by running a forward simulation of the non-linear 
ODE solver, conditional on proposed parameter vector 6*, to obtain the state of the ODE 
system, . . . , X g ^ s (T), which correspond to the observations 

o g>a , s (i),.'.:,o g , a , s (T). 

3. Accept the proposed new Markov chain state comprised of 0* with acceptance probability 
given by 



a 



1 ' ] \ , C(eU-^O g , aia (l),...,O 9ia , a (T))p(9^))q(0*,0^) 



where we evaluate this acceptance probability utilising the expressions developed in Section 
3.2, Section 3.2 and Section 4.1. These steps are repeated for j e {1, . . . , J}. 



4.1 Adaptive Metropolis 

To complete the specification of the methodology utilised, we present the internal adaptation strat- 
egy we considered in this paper based on the adaptive Metropolis algorithm detailed in [49]. This 
is a variant of the approach proposed in [30] which develops a Random Walk Metropolis-Hastings 
(RWMH) that estimates the global covariance structure from the past samples. 

Under an Adaptive Metropolis algorithm, the proposal distribution is based on a Gaussian mix- 
ture kernel detailed in [49]. The proposal, q (O^" 1 ^, 0*), involves an adaptive Gaussian-mixture 
Metropolis proposal, one component of which has a covariance structure that is adaptively learnt 
on-line as the algorithm explores the posterior distribution. For iteration j of the Markov chain the 
proposal is 

Qj (0^\ ■) = 7 A/- (r ; ^f^Sij + (1 - 7 ) AT (fl*; ^f-lj^ ■ (16) 

Here, $ = Sj is the current empirical estimate of the covariance between the parameters of 6, 
estimated using samples from the Markov chain up to time j — 1. The theoretical motivation for 
the choices of scale factors 2.38, 0.1 and dimension d are all provided in [49] and are based on 
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optimality conditions presented in [48]. We note that the update of the covariance matrix, can be 
done recursively on-line via the following recursion (as detailed in [5]): 



S i+ i - Ej + — - 







(17) 



5 Synthetic Data Analysis 

In this section we first study the performance of the AdMCMC Forward simulation methodology 
in a synthetic data example. To perform this study we utilised model parameters randomly selected 
as: 

(TRm, TRf, WIPm, WIPf, DWTm, DWTf, DAIm, DAIf, PSCm, PSCf, 
S, A Y , EPSa, EPSr) = (0.9, 0.9, 0.95, 0.85, 0.15, 0.3, 3.2, 3.4, 0.5, 0.6, 5.0, 2.0, 0.5, 0.5) . (18) 

and distributed as shown in Table 5. Note that Dim and DIf are omitted because we decided to 
assume a life-long immunity for all individuals in the population. This simplification can easily be 
removed and would not modify our estimation procedure or models developed. 

With an initial population size of 10,000 we simulated trajectories of system (2)-(6) over annual 
time steps for 120 years (t E [0, 120]) and every 10 years we calculated the synthetically generated 
"true" state and a noisy set of observations for each age-group under the specified statistical models 
in Equation 10. These ODE state trajectories and the observations were taken as the "true" syn- 
thetic data. Next we ran 100,000 iterations of the AdMCMC Forward Projection sampler to obtain 
samples from the posterior distribution conditional on these synthetically generated observations. 

In Figure 2 we present the posterior estimates for the ODE epidemic static model parameters 
(calibration performance) under the Bayesian model constructed in Section 3. We illustrate this 
performance for the model parameters that describe the rates of transmission between states as 
well as those that quantify the statistical uncertainty we model in the sexual mixing matrix. Each 
of these parameters is estimated based on observations generated for the number of new diag- 
noses (incidence) and seroprevalence utilising prior specifications derived from previous literature 
studies. The important features of this synthetic study is that we can illustrate that our Adaptive 
MCMC Forward Projection sampling methodology is able to generate samples from the resulting 
high dimensional posterior efficiently and that the estimated posterior parameter MMSE values and 
95% posterior C.I. contain, in all cases, the "true" model parameters utilised to generate the data. 
These results demonstrate that the AdMCMC Forward project estimation procedure we developed 
is working accurately in this controlled synthetic study. This is also confirmed by the estimation 
of state trajectories. An example of estimated state trajectories for the total number of males in 
every disease state is shown in Figure 3. For all simulated time steps and for each disease state 
we observe that the "true" trajectory is contained within the 95% posterior confidence interval for 
the aggregated trajectory and, in addition, the estimated mean trajectory is in agreement with the 
"true" aggregated trajectory. 

Furthermore, we can also provide analysis of the calibration performance as a function of the 
observed data versus estimated model fits to incidence and seroprevalence per age-group. The 
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95% posterior confidence intervals of the simulated forecast posterior calibrations are compared to 
and found in good agreement with the "true" synthetically generated observations, as depicted in 
Figures 4 and 5, for the number of new genital warts diagnoses and the seroprevalence by age and 
gender. 



1.4r 




TRm TRI WIPm WIPf DWTm DWTf DAIm/5 DAII/5 PSCm PSCf EPSa EPSr 



Figure 2: "True" model parameters (gray circles) for synthetic study versus estimated model pa- 
rameters (light gray circles) and error bars (95% posterior C.I.) for the model corresponding to the 
combined HPV-6 and -11 case. 



6 Real Data Analysis and Results 

In this section we undertook statistical estimation and performed calibration of the ODE model 
to the real data observations for HPV-6, HPV-11 and HPV-6 and -11 combined. We maintained 
the assumption that the duration of immunity is life-long for all individuals, since we found that 
as immunity was chosen closer to permanent the calibration became consistently better. This was 
measured with respect to the predictive performance of the posterior MMSE and associated pos- 
terior 95% C.I. obtained from the model for each age-group and gender for seroprevalence and 
number of new diagnoses (incidence). 

The first results we consider are for HPV-6 ( Fig. 6). The seroprevalence calibration is good 
for all age-groups except the youngest males and females in age-groups 4 and 5. At this point it 
is appropriate to note that all data recordings and observations we obtained for the males in age- 
group 1 can be questioned. We are inclined to think that the reported numbers do not represent the 
real-life situation since they suggest that both seroprevalence and incidence are much lower among 
males in this age-group compared to the females of the same age. Considering the reported sexual 
partner acquisition rates (see Table 8), which are the same for males and females in age-group 1, 
there is no apparent reason to believe this should be true and the reported numbers may simply 
be due to young males being reluctant to seek medical assistance. As for the females aged 30-39, 
their high seroprevalence level was not captured by the model with the prior distributions specified 
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Figure 3: State trajectories for males in every disease state (synthetic case, combined HPV-6 and 
-11): "true" trajectories (black line), MMSE (gray line) and 95% CI (shaded area); note that in the 
presented case the trajectories for seropositive and seronegative males coincide, so it is convenient 
to call males in either state "recovered". 



as in Table 5. Regarding incidence, we managed to obtain a reasonable fit for both females and 
males. The calibration for HPV-11 (Fig. 7) was accurate for seroprevalence (note that this is not 
the case for the youngest males and females though) and less accurate for incidence, especially 
when considering males. A similar result was observed for the combined HPV-6 and -11 data (Fig. 
8). 
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Figure 4: Calibration for synthetic case at time point t = 30 (combined HPV-6 and -11): MMSE 
(solid line) and 95% CI (shaded area). 



7 Impact of Vaccination via the Posterior Predictive Distribu- 
tion 

In this section we demonstrate how to further develop the modelling framework provided for the 
calibration of the models developed for HPV-6 and -11 to incorporate a statistical analysis of the 
impact of vaccination. Having undertaken a calibration, via Adaptive MCMC forward projection 
methodology, presented in Section 4, we obtain an empirical estimation of the posterior distribution 
for the model parameters presented in Table 3 and the associated posterior estimates for the states 
at time T when the last observation was taken prior to a vaccination treatment. This information 
can be represented by the empirical estimation of the marginal posterior distribution: 

p (TR, WIP, GWT, DAI, PSC, DI, £, A Y , EPSa, EPSr, X, IM JT) O IIM J\). O g , a>s (T)) , 

using the N samples from the Markov chain generated by the Adaptive MCMC Forward Projec- 
tion algorithm. This empirical distribution can then be utilised to obtain the posterior predictive 
distribution, given by the resulting Monte Carlo approximation: 
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Figure 5: Calibration for synthetic case at time point t = 120 (combined HPV-6 and -11): MMSE 
(solid line) and 95% CI (shaded area). 



p(O g ^ s (T + 1), . . . , O w (T + r)\O g;a>s (l), O g;a>s (T)) = 

J p(O g , a , s (T + 1), . . . , O g>a>s (T + r)|TR, WIP, GWT, DAI, PSC, DI, S, Ay, EPSa, EPSr, 

X W (T), . . . , X g ^ s (T + r))xp (TR, WIP, GWT, DAI, PSC, DI, E, Ay, EPSa, EPSr, 
X gAsS (l), X w (T)|O w (l), . . . , O g , a>a (T)) dTR rfWIP dGWT dDAI dPSC dm 

1 J 

dZ dA Y dEPSa dEPSr = -^p(O w (l), . . . , O g ^ s (T) | {TR, WIP, GWT, DAI, PSC, DI, 

J 3 = 1 

E, Ay, EPSa, EPSr, X gA>s (l), X W (T)} J ') . (19) 

Here the quantities 

p(O w (l), . . . , 9 , a , s (T)|{TR, WIP, GWT, DAI, PSC, DI, E, Ay, EPSa, EPSr, 

X W (1),...,X W (T)}') 

are obtained using the samples from the Adaptive MCMC Forward Projection calibration, then 
used to forward project the ODE solver to obtain estimates of O g ^ s (T + 1), . . . , O g ^ a , s {T + r), 
conditional on sampled parameters and states at time T. 
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Figure 6: Calibration for HPV-6: MMSE (solid line) and 95% CI (shaded area). 



The results of this analysis are presented in Figure 9. Here we present the posterior predictive 
distribution and predictive 95% posterior confidence intervals for the incidence of HPV-6 and HPV- 
1 1 combined in the population post a vaccination. The vaccination scheme, commenced at the time 
T + 10, assumes that 80% of all 15-19 y.o. previously not vaccinated females (age-group 1) receive 
a single dose of vaccine each year. The effect of the vaccine is exclusively in reducing the force of 
infection on a vaccinated individual by 90%, comparing to that on an individual belonging to the 
same age and sexual activity-group who has not been vaccinated. 

The results presented in this section have demonstrated how to incorporate the information that 
was obtained from the posterior calibration of the HPV-6 and HPV-1 1 models to observed data, to 
predict the vaccine response post calibration to a given equilibrium in the epidemic model within 
the population. They demonstrate the resultant new equilibrium level for incidence of HPV that 
would arise post vaccination, and the time taken to reach a new equilibrium level post- vaccination. 



8 Conclusion 

In this paper we developed a novel combination of a compartmental HPV-6/- 1 1 ODE transmission 
model and a Bayesian statistical modelling framework. We investigated the possibility of cali- 
brating the model to available seropositivity [42] and genital warts incidence [43] data. We then 
demonstrated how to perform posterior predictive inference related to impact of vaccination on the 
modelled population. 



22 




Figure 7: Calibration for HPV-1 1: MMSE (solid line) and 95% CI (shaded area). 



The estimation of the ODE model was achieved via adaptive Markov chain Monte Carlo based 
sampling methodology based around adaptive Metropolis coupled with a forward projection ODE 
solver to perform the joint challenge of sampling the static model parameters together with the 
estimated latent ODE states from transience to equilibrium. This was required in order to perform 
the evaluation of the rejection stage of the MCMC algorithm. We demonstrate the suitability of 
this sampling and estimation methodology and discuss the practical application of this estimation 
procedure. In the process we illustrate the capability of the proposed estimation procedure to 
perform calibration of these very high dimensional models for Australian data on observations 
of genital warts incidence and seroprevalence. We then develop a novel Bayesian perspective on 
studying the estimated vaccine response from such a model calibration, based around the posterior 
predictive distribution of the equilibrium states of the HPV epidemic model after vaccination. 

In each case our aim was to ensure that the models developed were both parsimonious with 
respect to the number of parameters utilised to parametrize the ODEs describing the evolution of 
the disease epidemic for HPV-6 and -11 and sufficiently flexible to statistically calibrate the ODE 
compartmental model to the observed number of new genital warts diagnoses and seroprevalence 
per age-group. 

In addition, we maintain that the introduction of the approach we develop for modelling sero- 
prevalence and its interpretation in the context of the poorly understood implications of serocon- 
version are in line with now popular views relating to HPV epidemic modelling. In particular, we 
assumed that only a person who recovered from either an asymptomatic HPV infection or genital 
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Figure 8: Calibration for combined HPV-6 and -11: MMSE (solid line) and 95% CI (shaded area). 



warts, and is not susceptible, can be seropositive. 

An attempt to calibrate to the seroprevalence data was of interest because there were no such 
data available for Australia until they were presented in [42]. Additionally, we were not aware of 
any published models calibrated to both seroprevalence and genital warts incidence data. Taking 
into account that all the data are necessarily prone to various errors we considered a calibration pro- 
cess successful if the simulated observation values were found within the reported 95% confidence 
interval for the real data. 

We have demonstrated that a reasonable calibration for the ODE epidemic model under consid- 
eration could be achieved only if some of the model parameters were allowed to take values beyond 
the ranges specified for them using currently available literature. We believe that this may be due 
to one or any combination of the following reasons: an insufficient amount of HPV natural history 
detail incorporated in the model, poor quality of the available data and the uncertainty in interpreta- 
tion of these data in the context of compartmental ODE models. Also, we have found that a notably 
better calibration can be observed if we change the assumptions we made about seropositivity in 
favour of a somewhat controversial idea that the seropositive status should persist regardless of an 
individual's disease state, and consider relatively short (up to 10 years) durations of immunity for 
both males and females. However, in this case the ODE system we had to deal with appeared to 
be stiff and the solver we used was significantly slower to complete solutions. An improvement to 
the quality of calibration can be ensured by the introduction of different durations of immunity for 
different age-groups. One concern with this extension to the model is the parsimony of the model, 
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Figure 9: Posterior predictive distribution for vaccine response: MMSE (solid line) and 95% CI 
(shaded area). 



since the introduction of separate durations of immunity for all age-groups involves the necessity 
to deal with at least nine new parameters with non-informative prior distributions (or 18 new pa- 
rameters if we want to improve the calibration further and assume different durations of immunity 
for males and females). Not only would one question in such settings the ability to accurately 
estimate such parameters in practice due to the observational data properties and the amount of 
observation data, but there would also be important statistical questions to be addressed relating to 
over parameterization of the model, over fitting of the model and parsimony. Since the intention 
of this work was to utilise parsimonious and interpretable models, we feel the model assumptions 
introduced were warranted to illustrate the properties of our estimation methodology. In addition, 
the extension of this methodology to working with additional parameterizations is trivial and easily 
incorporated into our estimation framework. Therefore we decided, based on the observed data 
under analysis, that it was prudent to maintain the parsimonious model representation developed. 

On the other hand, we would like to emphasize that our proposed estimation methodology 
based around adaptive MCMC forward projection is automated and easily extended to facilitate any 
number of additional model parameters. This is in contrast to other approaches already proposed in 
the literature which may suffer from the curse of dimensionality under such parameter extensions. 
For example, had we utilised a basic Metropolis-Hastings sampler with a non-adaptive proposal 
mechanism, and not the adaptive Metropolis methodology we developed for this solution, then the 
tunning of the resulting transition kernel would be affected by such changes and would have to 
be performed again under each such model change. This would entail a significant computational 
burden in performing the model calibration under each change. In addition it may result in a serious 



25 



practical hurdle to overcome in re-design of the proposal mechanism in the Metropolis-Hasting 
sampler in order to facilitate efficient mixing in such high dimensional sampling frameworks. 

We also note that our method is favorable relative other approaches which are not Markov 
chain based solutions. For example, Latin Hyper Cube sampling solutions which calibrate epi- 
demic ODE models using grids over regions of the parameter space will suffer when the parameter 
space is significantly increased, see examples in HIV epidemic models such as ??. We note that un- 
der such approaches, an increase in the parameter space by an additional m dimensions to represent 
the new model parameterization would require a grid be placed over an additional m dimensional 
space, such significant increases in dimension may cause such approaches to suffer a curse of di- 
mensionality attributed to the super-linear (exponential) increase in volume associated with adding 
the extra m dimensions to the parameter space. For each such grid point, such techniques would 
then require a complete iteration of the ODE solver to stationarity, incurring significant additional 
computational cost for such model changes, even when sparse grid based techniques are used. Of- 
ten incurring this additional computational cost for little benefit as regions of the parameter space 
of little significance to the most likely calibration points must be explored. This is in contrast to 
our adaptive MCMC based procedure which focus computational effort on regions of the parameter 
space proportionally with how probable, with respect to the posterior support, they are to contribute 
to suitable calibrations of the ode epidemic model. 

In addition we mention that since LHC schemes do not straightforwardly facilitate a Bayesian 
sampling paradigm, and any probabilistic selection of grid points in the additional parameter space 
can be at best selected under such schemes based around prior information and perhaps heuristics 
on a goodness of fit criterion, then one readily realises the utility and flexibility of our proposed 
Bayesian framework and Adaptive MCMC Forward Projection methodology. This methodology is 
flexible in any dimension and under any model change, learning on-line the appropriate parameter 
subspaces to focus computational effort as dictated by a statistically consistent combination of both 
prior belief on rates of population movement between states and observed evidence. Therefore we 
are able to clearly quantify the probability weighted distribution of plausible calibration regions 
in the high dimensional parameter space using the samples we obtained from the posterior con- 
structed. This is in contrast to previous literature which tries to get such information heuristically 
from procedures such as sensitivity analysis of the ode structure to different parameter regions, 
again incurring significant computational effort. Furthermore, at best such approaches can only 
produce what is termed the "variability or the prediction imprecision" in the outcome variables 
that is due to the uncertainty in estimating the input values. It is important to note that such quan- 
tifications are not statistically based measures of prediction uncertainty, where as the predictive 
performance we develop under our Bayesian framework can be directly interpreted as the predic- 
tive distribution of the model, again providing advantages of our proposed Bayesian estimation 
approach in interpretation of the results. 
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Appendix 1. Sexual mixing matrix 

Sexual mixing matrices are widely used in modelling of sexually transmitted diseases and their 
main purpose is to describe how certain characteristics of an individual define his or her sexual 
activity (i.e. the rate of sexual partner change). These characteristics typically are the age and/or 
sexual activity-group an individual belongs to. 

Notations and available data 

From now on we will denote a gender by g, the gender opposite to g by g', a sexual activity (risk) 
group an by s or s' and a or a' will be an age-group. Also, when referring to an individual of gender 
g who belongs to a sexual activity-group s and age-group a we will simply say "an individual from 
g, s, a" for brevity. 

In the equations describing our model we have a force of infection term for every individual 
from g, s, a: 

i Ig',s',a' 1 ^20) 

Sg>,s',a> + Ig',s',a' + Gg',s',a> + Pg',s',a' + ^g',s',a' J 

where (3 g is the transmission probability per partnership, i.e. the probability that a susceptible per- 
son of gender g will get infected from his or her infectious partner of gender g'\ c fl|S)S ' )0;a / is the 
mean per capita rate per year at which an individual from g,s,a acquires new sexual partners from 
g', s', a'; p g>s> s' >a> a' is the conditional probability that an individual from (g, s, a) forms a partner- 
ship with someone from g', s', a'. Products Cg ;SiS > >a>a 'p giS>s t ;aia > are the rates of new sexual partner 
acquisition and they can be conveniently gathered in a matrix which we call a sexual mixing matrix. 

Now we will start constructing a sexual mixing matrix for our model according to the approach 
by Garnett and Anderson [23] and its version used in [18]. In the process of construction we should 
use the relevant data estimated from the results of Australian Study of Health and Relationships 
(ASHR). ASHR was a telephone survey of a random sample of about 20,000 people who were 
Australian residents aged from 16 to 59 y.o. Despite its limitations (see [53] and [45] for discussion) 
this survey provided important information on sexual behaviour of Australians and its results are 
currently the most representative ones available. The data we need (as suggested in [23], [21] or 
[24]) are the following: 

• relative sexual partner acquisition rates r a for each age-group a (the same for males and 
females; see Table 8) 

• relative sexual partner acquisition rates r s for each sexual activity (risk) group s (the same 
for males and females; Table 9) 

• overall sexual partner acquisition rate c for the whole population (both males and females): 

c = 0.437. 
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All these rates are per capita per year. If for all g,s,a we knew the rates at which an individual 
from g,s,a acquires new sexual partners from the entire pool of individuals of gender g' (denote 
them Cg )Sia ), we could find the lowest of these rates, 

Cmin • »yj S, (X C m i n ^ Cg jSj(I , 

and divide by it all c g>s ^ a denoting the results of division by r ffiS)0 : 

r,,„ = (21) 



These would be the relative sexual partner acquisition rates for g,s,a. We can specify them as 
follows 

r g , s ,a = r a x r s Wg,s,a. (22) 

We can now calculate c m i n . Note that cN g is the number of partners of gender g' all individuals 
of gender g acquire per year. This number must be equal to the total of sexual partners of gender g' 
individuals of gender g from each age and sex group acquire, so 

^ ^ ^g,s,a-^g,s,a CminTg ,1 ,1 ^g,l,l ^mi'nT g, 1,2-^ g,X ,2 ~\~ ■ ■ ■ C m j n ^ ^ ^g,s,a-^g,s,a- 



Hence, 



cN g 



Z^s,a ' 9,s,a 1 ^g,s,a 

also, with c min at hand we can easily calculate all c 9jSiQ (see (21)). 



(23) 



Proportionate and assortative mixing 

Now we would like to specify p g , s ,s',a,a'i which are the conditional probabilities that an individual of 
gender g from sexual activity-group s and age-group a gets a sexual partner of the opposite gender 
g' from sexual activity-group s' and age-group a'. There are three possibilities to consider: 

1. so-called "proportionate" sexual mixing by age-group or sexual activity-group; this means 
that Pg >S) s',a,a' ma Y be assumed equal to the proportion of partnerships "generated" by indi- 
viduals of gender g' from age-group a' (and/or sexual activity-group s): 



p qss , aa ,= — ' r ^'"'- V ' / — ' (24) 

fg,s,s ,a,a ■/ 
2^s>,a> Cg',s',a'Mg',s',a> 

or 

_ Eg' Cg',s',a'N g/}S >, a ' 
Pg,s,s',a,a' — ^ AT ■ \^->) 

Z^sV Cg>,s>,a>Mg>,s>,a> 

In other words, an underlying assumption here is that more active members of gender g' 
have higher chances to become a new sexual partner to someone of gender g. Note that in 
case of proportionate mixing by both age and sexual activity-group we should simply define 
Pg,s, s ',a,a' and a product of the right-hand sides of (24) and (25). 
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2. "assortative" mixing (also known as "with-like" mixing); again, this can be by age or sexual 
activity-group or both, and if it's, for example, by age-group, we define 

Pg,s,s',a,a> = ^aa'- (26) 

That is, the probability of establishing a partnership is 1 if a potential partner is of the same 
age and otherwise. 

3. "disassortative" mixing ("with-unlike") suggests that a partnership can be formed only with 
someone from a different age (or sexual activity) group. 

We would like to be able to adjust the sort of mixing in our model depending on our needs. 
Therefore, we specify the probabilities p g , s ,s',a,a' as follows 

P 9 , s ,s',a,a> = f(l - EPSa)6 aa , + EPSa ) x 

'1 - EPSr)5 ss , + E PSr ^ a ' C9 '' s '' a ' N °'' s '' a ' . ) (27) 

2^ s ',a' c g',s',a'N g/)S ^ a i J 

Parameters EPSa and EPSr help us vary the extent of assortativeness by age and sexual 
activity-group: if EPSa = mixing is fully assortative by age-group; if EPSa = 1 it is fully 
proportionate by age-group. In a similar fashion we can vary EPSr and control assortativeness by 
sexual activity-group. 

Age related adjustments 

Here we introduce some adjustments to emphasise the effect of a steady popularity of the partner- 
ships between older males and younger females. Let for now g denote males and g' females. We 
reduce the probabilities that males in the age-group 3 and higher have female partners of the same 
age: 

Pg,s,s',a,a' ~^ Pg,s,s> ,a,a> (1 ~ T) if < > ^ (28) 



Similarly, we reduce the probabilities that females in the age-groups 1 to 5 form a partnership with 
males of the same age: 



a = a' 



Pg',s,s',a,a' — > Pg> ,s,s' ,a,a> (1 _ T) if ^ Q < 5 

To compensate for the effect of these adjustments we increase the probabilities for males to have a 
female partner one age-group younger but belonging to the age-groups not higher than 5, 

, -r ( a = a' + 2 

Pg,s,s',a,a> ~^ Pg,s,s',a,a' + J- Pg,s,s',a,a U <> / < g 

and also we increase the probabilities that female have one age-group older males from the age- 
group 3 or higher as a partner: 

f a = a' - 2 

Pg',s,s',a,a> ~ > Pg' ,s,s> ,a,a' + J- Pg',s,s',a,a H < , ^ (Jlj 
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So far we have used the rates c g>Sja which describe new sexual partner acquisitions by individ- 
uals from g,s,a from the whole population of gender g'. So to find out the rates of acquisition of 
new sexual partners from a certain age and sexual activity-group (a 1 and s') we should multiply 
c g ,s,a by p g ,s,s',a,a'- However, now we would like to make the rates c g)S>a group-specific in terms of 
the groups the sexual partners are selected from. This will be achieved via balancing supply and 
demand of sexual partners. 

Balancing supply and demand 

We want the following to hold for all g, s, a and g', s', a': 

Cg,s,aPg,s,s' ,a,a' ^g,s,a Cg 1 ,s' ,a' Pg' ,s' ,s,a' ,a^g' ,s' ,a' ■ (32) 

Here c gtS>a pg jS)S > taja / is the rate of acquisition of new sexual partners by individuals who belong to 
g, s, a from g', s', a', and c g>Sta pg tS>s r jata 'N gtSja is the total number of new partners acquired by g, s, a 
from g',s',a'. So the equation simply states that the total number of new partners acquired by g, s, a 
from g', s', a' must be equal to the total number of new partners acquired by g',s', a' from g,s,a. 
Let 

^g,s,aPg.s,s' ,a.a' g,s,a 7^ ^g' ,s' ,a' Pg' ,s' ,s,a' .a-^g' ,s' ,a' • 

We want to find such a multiplier B that 

B Cg t s,aPg,s,s' ,a,a' g,s,a B Cg' js' ,a' Pg' ,s' ,s,a' ,a-^g' ,s' ,a' • 

Then 

j^6i-6 2 C 9',s',a'Pg',s',s,a',a-^g',s',a' 

Cg,s,aPg,s,s' ,a,a' ^g,s,a 

To keep things simple, let 9 1 — 9 2 = 1. Note that B serves as a degree of imbalance: the balance 
is established if B = 1. So, to ensure that (32) is true it is enough to introduce the group-specific 
rates c S)S)S / ;(Iia / to be used instead of c 9jSia : 

Cg,s,s' ,a,a' Cg,s,a,B , Cg' jS ' jSjCl ' ja Cgi )S ' a ' B (33) 

We limit the range of value of parameter 9\ to [0, 1]. Suppose the supply and demand are not 
balanced and 61 = 0. Then as follows from (33), c ff|S)S / j0)(I ' = c 9 S a (the rates for gender g do not get 
adjusted), but c s / |S / jS>a ' ja = c g / )S ' j0 'B _1 (the rates for gender g' are adjusted). If 6i = 1 it is the other 
way around. Consequently, 6\ indicates to what extent individuals of each gender adjusts their 
sexual partner acquisition rates (i.e. we may say, sexual behaviour) in case the available supply of 
sexual partners does not meet demand. 

At this stage our sexual mixing matrix is fully formed. 
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Disease State Description 



S ' Susceptible' ': an individual who is at risk of getting infected; 

J 'Infected': an individual with type 6 or 1 1 HPV; 

G 'Genital warts': an individual who developed genital warts and is being treated; 

P 'Seropositive': an individual who is recovered and seropositive; 

N 'Seronegative': an individual who is recovered and seronegative. 



Table 1: Compartments defined in the HPV-6/-11 transmission model according to disease states 
within a population. 
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Coefficient Coefficient Interpretation 



\,s,a(t) Force (risk) of infection - a per capita rate of 

becoming infected per unit time (a year in our model); 

r g Recovery rate for a patient who is being treated; approximately equal to 

1 /(average duration of treatment for genital warts); 

p g Recovery rate for an asymptomatic individual; approximately equal to 

1 /(average duration of an untreated HPV infection); 

u g Probability that an individual becomes seropositive; 

( g Rate at which an individual loses immunity; approximately equal 

to 1 /(average duration of immunity); 

7 5 Genital warts development rate; approximately equal to 

1 /(average time period between becoming infected and detection of genital warts). 

Table 2: Interpretations of the coefficients in the HPV-6/-1 1 model equations. 



Parameter 


Interpretation 


Prior 


Non-linear ODE Model Parameter Priors 


Transmission probability (males) 


TRm 


U[Ba, Bb] 


Transmission probability (females) 


TRf 


U[Ba, Bb] 


Average genital warts incubation period (males) 


WIPm 


Ga(k WIPm , 9wiPm) 


Average genital warts incubation period (females) 


WIPf 


Ga(kwipf, OwiPf) 


Average duration of genital warts treatment (males) 


DWTm 


Ga(knWTm, @DWTm) 


Average duration of genital warts treatment (females) 


DWTf 


Ga(kDWTf, & DWTf) 


Average duration of asymptomatic HPV infection (males) 


DAIm 


Ga(kDAIm, Odaiw) 


Average duration of asymptomatic HPV infection (females) 


DAIf 


Ga{koAif, 9 DAif) 


Average duration of immunity (males) 


Dim 


U (knim, 9 Dim) 


Average duration of immunity (females) 


DIf 


U(k DIf ,9 DIf ) 


Probability of seroconversion (males) 


PSCm 


Be(apscm, Ppscm) 


Probability of seroconversion (females) 


PSCf 


Be(a PS cf, Ppscf) 


Observation Error Parameters Priors 


Diagonal element of the observation error covariance matrix 






for incidence £ = diag(a) 


a 


invGa(k a , 9 a ) 


Observation error scale for seroprevalence 


A Y 


Ga(k AY ,0 AY ) 


Mixing Matrix Parameter Priors 


Degree of assortativeness by age-group 


EPSa 


Be(a ea ,(3J 


Degree of assortativeness by risk-group 


EPSr 


Be(a €r ,(3 €r ) 



Table 3: Prior specification for non-linear-ODE models for HPV epidemic. Note that the non-linear 
ODE model parameter priors can all be assumed non-informative. 
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males 



females 



duration (in years) of 


value 


95% CI 


source 


value 


95% CI 


source 


Incubation period, median 


0.916 


0-1.341 


[4] 


0.241 


0-0.475 


[60] 


Treatment, mean 


0.281 


0.213-0.349 


[33] 


0.232 


0.185-0.280 


[33] 


Treatment, median 


N/A 


N/A 


N/A 


0.491 


0.325-0.666 


[33] 


Untreated infection, median 


0.45 


0.425-0.475 


[29] 


0.5 


0.475-0.575 


[56] 


Untreated infection, mean 


N/A 


N/A 


N/A 


0.7916 


0.575-1.0 


[56] 



Table 4: Data from medical literature which aid in specification of prior hyper-parameter values for 
the HPV-6/-11 model. 







WIP 


DAI 


DWT 


6 


males 
females 


[7(0.9,1.3) 
£7(0.6,0.9) 


[7(2.2,3.6) 
[7(2.2,3.6) 


Ga(92.00, 0.003) 
Se(69.00, 231.00) 


11 


males 
females 


C/(0.9,1.3) 
[7(0.6,0.9) 


[7(2.0,3.6) 
[7(2.0,3.6) 


Ga(92.00, 0.003) 
56(69.00,231.00) 


6/11 


males 
females 


[7(1.0,2.0) 
[7(1.0,2.0) 


[7(3.8,4.8) 
[7(3.8,4.8) 


Ga(92.00, 0.003) 
56(69.00,231.00) 



Table 5: Selected prior distributions actually used in our simulations. While DWT is in agreement 
with the real data, WIP and DAI were allowed to take values beyond their reported ranges (see 
Table 4). 







HPV-6 






HPV-11 




combined HPV-6 and -11 




males 


females 


males 


females 


males 


females 


no. 


% 


95% CI 


% 


95% CI 


% 


95% CI 


% 


95% CI 


% 


95% CI 


% 


95% CI 


1 


0.6 


0.0-3.3 


7.0 


3.4-12.6 


1.2 


0.1-4.3 


1.4 


0.2-5 


1.2 


0.1-4.3 


7.7 


3.9-13.4 


2 


9.6 


5.9-14.4 


12.2 


7.5-16.9 


7.2 


4.1-11.6 


4.9 


2.1-7.7 


13.4 


9.1-18.8 


18.2 


13.6-23.6 


3 


9.6 


5.9-14.4 


17.7 


13.0-22.4 


7.2 


4.1-11.6 


4.0 


1.2-6.8 


13.4 


9.1-18.8 


18.2 


13.6-23.6 


4 


15.1 


10.1-21.4 


22.0 


17.6-27.1 


7.6 


4.1-12.6 


6.4 


3.9-9.7 


17.4 


12.1-24.0 


23.6 


19.0-28.7 


5 


15.1 


10.1-21.4 


22.0 


17.6-27.1 


7.6 


4.1-12.6 


6.4 


3.9-9.7 


17.4 


12.1-24.0 


23.6 


19.0-28.7 


6 


15.4 


9.9-22.4 


18.8 


14.4-23.7 


9.1 


4.9-15.0 


11.8 


8.3-16.1 


20.3 


14.0-27.8 


24.0 


19.1-29.3 


7 


15.4 


9.9-22.4 


18.8 


14.4-23.7 


9.1 


4.9-15.0 


11.8 


8.3-16.1 


20.3 


14.0-27.8 


24.0 


19.1-29.3 


8 


12.9 


8.0-19.4 


17.7 


12.1-24.6 


7.5 


3.8-13.0 


4.4 


1.8-8.9 


15.6 


10.2-22.5 


19.0 


13.2-26.0 


9 


12.9 


8.0-19.4 


17.7 


12.1-24.6 


7.5 


3.8-13.0 


4.4 


1.8-8.9 


15.6 


10.2-22.5 


19.0 


13.2-26.0 



Table 6: Seroprevalence data for Australia as percentages of the whole male or female population 
(as reported in [42]). 
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age-group males females 



no. 


ages, y.o. 


mean 


95% CI 


mean 


95% CI 


1 


15-19 


1.66 


0.46-2.86 


7.28 


4.18-10.38 


2 


20-24 


6.27 


3.77-8.77 


8.61 


5.61-11.61 


3 


25-29 


7.4 


4.4-10.4 


6.37 


3.77-8.97 


4 


30-34 


4.64 


2.44-6.84 


4.33 


2.13-6.53 


5 


35-39 


2.34 


1.34-3.34 


2.19 


1.19-3.19 


6 


40-44 


2.34 


1.34-3.34 


2.19 


1.19-3.19 


7 


45-49 


0.83 


0.33-1.33 


0.48 


0.08-0.88 


8 


50-54 


0.83 


0.33-1.33 


0.48 


0.08-0.88 


9 


55-59 


0.83 


0.33-1.33 


0.48 


0.08-0.88 



Table 7: Genital warts incidence data for Australia measured as the mean number of new cases per 
1000 persons per year (as reported in [43]). 



age-group 


no. 


1 


2 


3 


4 5 6 


7 


8 


9 


age, y.o. 


15-19 


20-24 


25-29 


30-34 35-39 40-44 


45-49 


50-54 


55-59 


rate, r a 


5.28 


6.06 


4.37 


2.57 1.61 1.43 


1.00 


1.00 


1.00 



Table 8: Relative new sexual partner acquisition rates by age-group 



sexual activity-group 


1 


2 


3 


4 


population in each group, % 


60 


27 


11 


2 


rate, r s 


1.00 


4.76 


24.83 


105.67 



Table 9: Relative new sexual partner acquisition rates by sexual activity-group 
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